Three Heads are Better than One

Authors

  • Robert E. Frederking
  • Sergei Nirenburg
Abstract

Machine translation (MT) systems do not currently achieve optimal-quality translation of free text, whatever translation method they employ. Our hypothesis is that the quality of MT will improve if an MT environment uses output from a variety of MT systems working on the same text. In the latest version of the Pangloss MT project, we collect the results of three translation engines (typically, subsentential chunks) in a chart data structure. Since the individual MT systems operate completely independently, their results may be incomplete, conflicting, or redundant. We use simple scoring heuristics to estimate the quality of each chunk, and find the highest-scoring sequence of chunks (the "best cover"). This paper describes the combining method in detail, presenting the algorithm and illustrating its progress on one of the many actual translations it has produced. The method uses dynamic programming to efficiently compare weighted averages of sets of adjacent scored component translations. The current system operates primarily in a human-aided MT mode. The translation delivery system and its associated post-editing aid are briefly described, as is an initial evaluation of the usefulness of this method. The individual MT engines will be reported separately and are therefore not described in detail here.

1 INTRODUCTION

Current MT systems, whatever translation method they employ, do not reach optimal output on free text. In part, this is due to the inherent problems of each particular method: for instance, the inability of statistics-based MT to take long-distance dependencies into account, the difficulty of achieving extremely broad coverage in knowledge-based MT systems, or the reliance of most transfer-oriented MT systems on similarities between the syntactic structures of the source and target languages.
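The abstract says the best cover is found with dynamic programming over weighted averages of adjacent scored components. The Python sketch below illustrates that idea under an explicit assumption not stated in this excerpt: a cover's quality is taken to be the average of its components' scores, each weighted by the number of source words it spans. Because every full cover spans the same n words, maximizing that average is equivalent to maximizing the sum of length × score, which a single left-to-right sweep solves exactly. The names `Component` and `best_cover` are hypothetical, and the paper's actual scoring heuristics may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """A scored translation of source words [start, end) from one engine."""
    start: int
    end: int
    text: str
    score: float


def best_cover(n: int, components: List[Component]) -> Optional[Tuple[float, List[Component]]]:
    """Best sequence of adjacent components covering all n source words (n >= 1).

    Assumption: a cover's quality is the length-weighted average of its
    components' scores. Every full cover spans the same n words, so maximizing
    that average equals maximizing the sum of (length * score).
    Returns (weighted average, components in order), or None if some word is
    covered by no component.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)  # best[j]: max sum of length*score covering [0, j)
    back: List[Optional[Component]] = [None] * (n + 1)
    best[0] = 0.0
    # Bucket components by their right edge so the sweep can visit them in order.
    by_end: List[List[Component]] = [[] for _ in range(n + 1)]
    for c in components:
        if 0 <= c.start < c.end <= n:
            by_end[c.end].append(c)
    for j in range(1, n + 1):
        for c in by_end[j]:
            if best[c.start] == neg_inf:
                continue  # nothing covers the words to the left of this component
            total = best[c.start] + (c.end - c.start) * c.score
            if total > best[j]:
                best[j], back[j] = total, c
    if best[n] == neg_inf:
        return None  # the engines left at least one word untranslated
    chosen: List[Component] = []
    j = n
    while j > 0:  # follow back-pointers from the right edge to the left
        c = back[j]
        chosen.append(c)
        j = c.start
    chosen.reverse()
    return best[n] / n, chosen
```

As a hypothetical example, take a 4-word input where one KBMT chunk spans all four words with score 0.6, EBMT offers two 2-word chunks scored 0.9 and 0.8, and the lexical transfer engine offers four 1-word chunks scored 0.5. The sketch selects the two EBMT chunks, whose length-weighted average is 0.85, over the single KBMT chunk (0.6) or any mixture involving the lexical chunks.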
Our hypothesis is that if an MT environment can use the best results from a variety of MT systems working simultaneously on the same text, the overall quality will improve. Using this novel approach to MT in the latest version of the Pangloss MT project, we submit an input text to a battery of machine translation systems (engines), collect their (possibly incomplete) results in a joint chart data structure, and select the overall best translation using a set of simple heuristics.

2 INTEGRATING MULTI-ENGINE OUTPUT

In our experiment we used three MT engines:

  • a knowledge-based MT (KBMT) system, the mainline Pangloss engine (Frederking et al., 1993b);
  • an example-based MT (EBMT) system (see Nirenburg et al., 1993; Nirenburg et al., 1994b; the original idea is due to Nagao (1984)); and
  • a lexical transfer system, fortified with morphological analysis and synthesis modules and relying on a number of databases: a machine-readable dictionary (the Collins Spanish/English), the lexicons used by the KBMT modules, a large set of user-generated bilingual glossaries, as well as a gazetteer and a list of proper and organization names.

The outputs from these engines (target-language words and phrases) are recorded in a chart whose positions correspond to words in the source-language input. As each MT engine runs, it adds new edges to the chart, each labeled with the translation of a region of the input string and indexed by that region's beginning and end positions. For the remainder of this article we will refer to all of these edges as components (as in "components of the translation"). The KBMT and EBMT engines also supply a quality score for each output element. The KBMT scores are based on whether any questionable heuristics were used in source analysis or target generation. The EBMT scores are produced using a technique based on human judgements, as described in Nirenburg et al. (1994a, submitted).
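A minimal Python sketch of the chart described above may make the bookkeeping concrete. It assumes half-open regions [start, end) over source word positions, keeps every edge even when spans overlap or duplicate one another (so redundant and conflicting engine outputs coexist), and makes the score optional because only the KBMT and EBMT engines are said to supply one. The names `Edge`, `Chart`, `starting_at`, and `uncovered`, and the engine labels, are hypothetical and are not taken from Pangloss.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """One component: an engine's translation of source positions [start, end)."""
    start: int
    end: int
    translation: str
    engine: str                    # illustrative labels: "KBMT", "EBMT", "lexical"
    score: Optional[float] = None  # KBMT and EBMT supply a score; others may not


class Chart:
    """Chart over n source words; edges are indexed by their region's (begin, end)."""

    def __init__(self, source_words: List[str]) -> None:
        self.words = list(source_words)
        self.edges: Dict[Tuple[int, int], List[Edge]] = {}

    def add(self, edge: Edge) -> None:
        """Record a component; overlapping or duplicate spans are all kept."""
        if not 0 <= edge.start < edge.end <= len(self.words):
            raise ValueError(f"span ({edge.start}, {edge.end}) lies outside the input")
        self.edges.setdefault((edge.start, edge.end), []).append(edge)

    def starting_at(self, i: int) -> List[Edge]:
        """All components whose region begins at position i."""
        return [e for (s, _), group in self.edges.items() if s == i for e in group]

    def uncovered(self) -> List[int]:
        """Indices of source words that no component translates (incomplete output)."""
        covered = set()
        for (s, t), group in self.edges.items():
            if group:
                covered.update(range(s, t))
        return [i for i in range(len(self.words)) if i not in covered]

    def source_text(self, edge: Edge) -> str:
        """The source words that an edge translates."""
        return " ".join(self.words[edge.start:edge.end])


if __name__ == "__main__":
    # Hypothetical Spanish input with redundant and conflicting components.
    chart = Chart(["la", "casa", "blanca", "grande"])
    chart.add(Edge(0, 3, "the white house", "EBMT", 0.9))
    chart.add(Edge(0, 2, "the house", "KBMT", 0.7))
    chart.add(Edge(0, 2, "the home", "lexical"))  # conflicts with the KBMT edge
    for e in chart.starting_at(0):
        print(e.engine, repr(chart.source_text(e)), "->", e.translation, e.score)
    print("uncovered word indices:", chart.uncovered())
```

Indexing edges by their (begin, end) pair mirrors the description in the text and lets a later combining step look up every alternative for a region directly; the `uncovered` helper shows how gaps left by the independent engines could be detected before choosing a cover.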

Publication date: 1994